82 research outputs found

    Soliton generation in CaF2 crystalline whispering gallery mode resonators with negative thermal-optical effects

    Calcium fluoride (CaF2) crystalline whispering gallery mode resonators (WGMRs) exhibit ultrahigh intrinsic quality factors and a low power anomalous dispersion in the communication and mid-infrared bands, making them attractive platforms for microresonator-based comb generation. However, their unique negative thermo-optic effects pose challenges when achieving thermal equilibrium. To our knowledge, our experiments serve as the first demonstration of soliton microcombs in Q > 10^9 CaF2 WGMRs. We observed soliton mode-locking and bidirectional switching of soliton numbers caused by the negative thermo-optic effects. Additionally, various soliton formation dynamics are shown, including breathing and vibrational solitons, which can be attributed to thermo-photomechanical oscillations. Thus, our results enrich the soliton generation platform and provide a reference for generating solitons from WGMRs that comprise other materials with negative thermo-optic effects. In the future, the ultrahigh quality factor of CaF2 crystal cavities may enable the generation of sub-milliwatt-level broad-spectrum soliton combs.
    Comment: 4 pages, 5 pictures, description of soliton generation in a calcium fluoride whisper gallery mode microresonators with negative thermo-optical effect, ready for publication in optics lette

    Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning

    We revisit the estimation bias in policy gradients for the discounted episodic Markov decision process (MDP) from a deep reinforcement learning (DRL) perspective. The objective is formulated theoretically as the expected returns discounted over the time horizon. One of the major policy gradient biases is the state distribution shift: the state distribution used to estimate the gradients differs from the theoretical formulation in that it does not take into account the discount factor. Existing discussion of the influence of this bias in the literature was limited to the tabular and softmax cases. Therefore, in this paper, we extend it to the DRL setting where the policy is parameterized and demonstrate theoretically how this bias can lead to suboptimal policies. We then discuss why the empirically inaccurate implementations with shifted state distribution can still be effective. We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization. Specifically, we show that a smaller learning rate, or an adaptive learning rate such as that used by the Adam and RMSProp optimizers, makes the policy optimization robust to the bias. We further draw connections between optimizers and the optimization regularization to show that both the KL and the reverse KL regularization can significantly rectify this bias. Moreover, we provide extensive experiments on continuous control tasks to support our analysis. Our paper sheds light on how successful PG algorithms optimize policies in the DRL setting, and contributes insights into the practical issues in DRL.
    Comment: 12 pages, 9 figure
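    The state distribution shift named in this abstract can be illustrated with a minimal sketch, assuming a REINFORCE-style estimator over a single episode. The function name `reinforce_weights`, the `discount_states` flag, and the toy rewards are hypothetical and are not taken from the paper. In the discounted objective, the gradient term at step t carries an extra factor gamma^t; many practical implementations drop that factor, which corresponds to sampling states from the undiscounted distribution.

```python
def reinforce_weights(rewards, gamma, discount_states=True):
    """Per-step multipliers applied to grad log pi(a_t | s_t) for one episode.

    discount_states=True  : gamma**t * G_t (matches the discounted objective)
    discount_states=False : G_t only (the common, biased implementation)
    """
    T = len(rewards)
    returns = [0.0] * T
    g = 0.0
    # Accumulate discounted returns G_t from the end of the episode backwards.
    for t in reversed(range(T)):
        g = rewards[t] + gamma * g
        returns[t] = g
    if discount_states:
        return [gamma ** t * returns[t] for t in range(T)]
    return returns


# Toy episode (illustrative values only).
unbiased = reinforce_weights([1.0, 1.0, 1.0], gamma=0.5, discount_states=True)
shifted = reinforce_weights([1.0, 1.0, 1.0], gamma=0.5, discount_states=False)
```

    In this toy episode the two weightings agree at t = 0 and diverge afterwards, since the shifted version gives later states more weight than the discounted objective prescribes.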

    Towards a compact soliton microcomb fully referenced on atomic reference

    A fully stabilized soliton microcomb is critical for many applications of optical frequency combs based on microresonators. However, the current approaches for full frequency stabilization require either external acousto-optic or electro-optic devices or auxiliary lasers and multiple phase-locked loops, which compromises the convenience of the system. This study explores a compact, atomic-referenced, fully stabilized soliton microcomb that directly uses a rubidium atomic optical frequency reference as the pump source. In addition, the repetition rate (7.3 GHz) of the soliton microcomb is phase-locked to an atomic-clock-stabilized radio frequency (RF) reference by mechanically tuning the resonance of the optical resonator. The results demonstrate that the stability of the comb line (0.66 THz away from the pump line) is consistent with that of the Rb87 optical reference, attaining a level of approximately 4 Hz @100 s, corresponding to a frequency stability of 2E-14 @100 s. Furthermore, the frequency reproducibility of the comb line was evaluated over six days, and the standard deviation (SD) of the frequency of the comb line was found to be 10 kHz, resulting in a corresponding absolute deviation uncertainty of 1.3E-10, which is technically limited by the locking range of the soliton repetition rate. The proposed method gives a low-power and compact solution for fully stabilized soliton microcombs.
    Comment: 6 pages, 5 figure

    Aortic valve morphology and paravalvular leak regression after a self-expandable transcatheter aortic valve replacement

    Aims: The study aimed to compare paravalvular leak (PVL) changes after a transcatheter aortic valve replacement (TAVR) with self-expandable prosthesis between different aortic valve morphologies and evaluate the impact of paravalvular leak regression on clinical prognosis.
    Methods: Patients with aortic stenosis (AS) successfully treated with a self-expandable TAVR who were followed up for at least 1 year at our centre were consecutively enrolled from January 2016 to August 2019. Paired serial changes in paravalvular leak and other haemodynamic parameters by echocardiography were collected and compared between the bicuspid valve (BAV) and tricuspid aortic valve (TAV). A logistic regression model was used to explore the predictors of paravalvular leak regression (<1 grade) 1 year after transcatheter aortic valve replacement, while its impact on subsequent clinical outcomes (all-cause mortality and rehospitalisation for heart failure (HF)) was further evaluated using Kaplan–Meier analysis.
    Results: A total of 153 bicuspid valve and 114 tricuspid aortic valve patients were finally enrolled; haemodynamic parameters and paravalvular leak severity were comparable before the discharge between the two groups. The peak transaortic velocity, mean transvalvular gradient, and effective orifice area all significantly improved (p < 0.05) without intergroup differences at all follow-up timepoints. Significant paravalvular leak reduction was observed only in the TAV group (1.75% vs. 4.39%, p = 0.029), while moderate paravalvular leak was still more prevalent in BAV (7.19% vs. 1.75%, p = 0.041) at the 1-year follow-up. Multivariable analyses identified the bicuspid valve, asymmetric calcification, and undersizing as independent predictors of failure of the 1-year paravalvular leak reduction in patients with mild or moderate paravalvular leak after discharge. Patients without a paravalvular leak reduction within 1 year showed a relatively higher 2-year all-cause mortality and HF (HR: 5.994, 95% CI: 1.691–21.240, and p = 0.053) rates thereafter.
    Conclusion: In AS patients after self-expandable transcatheter aortic valve replacement, paravalvular leak regression within 1 year was less prevalent in bicuspid valve morphology. The failure of paravalvular leak reduction might lead to an increased risk of poorer prognosis in the long run

    Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

    We study the reinforcement learning problem of complex action control in Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and the Atari series, which makes it very difficult to search for any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system has low coupling and high scalability, which enables efficient exploration at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top professional human players in full 1v1 games.
    Comment: AAAI 202